Multi-level Fusion of Audio and Visual Features for Speaker Identification

Authors

  • Zhiyong Wu
  • Lianhong Cai
  • Helen M. Meng
Abstract

This paper explores the fusion of audio and visual evidence through a multi-level hybrid fusion architecture based on the dynamic Bayesian network (DBN), which combines model-level and decision-level fusion to achieve higher performance. For model-level fusion, a new DBN-based audio-visual correlative model (AVCM) is proposed, which describes both the intercorrelations and the loose timing synchronicity between the audio and video streams. Experiments on the CMU database and on our own homegrown database both demonstrate that the methods improve the accuracy of audio-visual bimodal speaker identification at all levels of acoustic signal-to-noise ratio (SNR) from 0 dB to 30 dB under varying acoustic conditions.
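The decision-level half of such a hybrid scheme can be illustrated with a minimal sketch. It assumes each stream has already produced a log-likelihood per enrolled speaker and combines them with a single weight, which is a common form of decision-level fusion; it is not the DBN-based method of the paper, and the names `fuse_scores`, `identify`, and `audio_weight` are hypothetical.

```python
def fuse_scores(audio_loglik, visual_loglik, audio_weight):
    """Weighted sum of per-speaker log-likelihoods from two streams.

    audio_weight is an assumed reliability weight in [0, 1]; the visual
    stream receives the remaining weight (1 - audio_weight).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_loglik.keys() != visual_loglik.keys():
        raise ValueError("both streams must score the same speakers")
    return {
        speaker: audio_weight * audio_loglik[speaker]
        + (1.0 - audio_weight) * visual_loglik[speaker]
        for speaker in audio_loglik
    }


def identify(audio_loglik, visual_loglik, audio_weight):
    """Return the speaker with the highest fused score."""
    fused = fuse_scores(audio_loglik, visual_loglik, audio_weight)
    return max(fused, key=fused.get)
```

In a sketch like this, a lower `audio_weight` would be the natural choice when the acoustic SNR is low, so that the visual stream dominates the decision.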

Similar resources

Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification

This paper investigates the estimation of fusion weights under varying acoustic noise conditions for audio-visual multi-level hybrid fusion strategy in speaker identification. The multi-level fusion combines model level and decision level fusion via dynamic Bayesian networks (DBNs). A novel methodology known as support vector regression (SVR) is utilized to estimate the fusion weights directly ...

The use of temporal speech and lip information for multi-modal speaker identification via multi-stream HMMs

This paper investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based arou...

Likelihood Ratio Based Score Fusion for Audio-Visual Speaker Identification in Challenging Environment

Visual speech information, used together with audio utterances, is well known to enhance the performance of noise-robust speaker identification. This paper presents an approach to evaluating the performance of a noise-robust audio-visual speaker identification system using likelihood-ratio-based score fusion in a challenging environment. Though the traditional HMM based audio-visual speaker identification...

Hybrid Feature and Decision Fusion Based Audio-Visual Speaker Identification in Challenging Environment

The contribution of this paper is to propose a novel approach of evaluating the performance of a noise robust audio-visual speaker identification system in challenging environment. Though the traditional HMM based audio-visual speaker identification system is very sensitive to the speech parameter variation, the proposed hybrid feature and decision fusion based audio-visual speaker identificati...

Multifactor Fusion for Audio-Visual Speaker Recognition

In this paper we propose a multifactor hybrid fusion approach for enhancing security in audio-visual speaker verification. Speaker verification experiments conducted on two audio-visual databases, VidTIMIT and UCBN, show that multifactor hybrid fusion, which combines feature-level fusion of lip-voice features with face-lip-voice features at the score level, is indeed a powerful technique for spe...

Publication date: 2006